ENVX1002 Introduction to Statistical Methods
The University of Sydney
Feb 2025
Tables: Experimental data
Tables: Observational data
Before we calculate averages, we need to think about the difference between a population and a sample
bakhtiarzein - https://stock.adobe.com/
Google earth image with farm and soil-landscape boundaries
\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n
-
sum(c(48, 56, 78, 86, 90, 271))
-
=SUM(A1:A6)
\mu = \frac{\sum_{i=1}^{N} y_i}{N}
\overline{y} = \frac{\sum_{i=1}^{n} y_i}{n}
-
mean(c(48, 56, 78, 86, 90, 271))
-
=AVERAGE(A1:A6)
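As a rough check on the formula, the sample mean can be rebuilt from the sum and the number of observations. This is only a sketch; the variable names `y`, `n` and `manual_mean` are assumptions made for illustration.

```r
# The six values used in the examples above
y <- c(48, 56, 78, 86, 90, 271)

n <- length(y)             # number of observations
manual_mean <- sum(y) / n  # sum of y_i divided by n

# Should agree with the built-in function
all.equal(manual_mean, mean(y))
```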
Population median: M=\left(\frac{N+1}{2}\right)th sorted value
Sample median: \tilde{y}=\left(\frac{n+1}{2}\right)th sorted value
-
median(c(48, 56, 78, 86, 90, 271))
-
=MEDIAN(A1:A6)
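With an even number of observations, (n+1)/2 is not a whole number (3.5 for these six values), so the median is taken as the average of the two middle sorted values. A minimal sketch, with variable names chosen only for illustration:

```r
y <- c(48, 56, 78, 86, 90, 271)
y_sorted <- sort(y)
n <- length(y_sorted)

# n is even here, so average the values at positions n/2 and n/2 + 1
manual_median <- (y_sorted[n / 2] + y_sorted[n / 2 + 1]) / 2

# Should agree with the built-in function
manual_median == median(y)
```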
Mode is the most commonly occurring number in a set of observations
-
[1] 4
-
=MODE.SNGL(A1:A7)
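Base R has no built-in function for the statistical mode (`mode()` reports the storage type of an object instead). One possible sketch uses `table()`. The data below and the helper name `stat_mode` are hypothetical and are not taken from the slides.

```r
# Hypothetical data, for illustration only
y <- c(2, 4, 4, 5, 4, 7, 1)

# Hypothetical helper: returns the most frequent value(s)
stat_mode <- function(x) {
  counts <- table(x)  # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])
}

stat_mode(y)
```

If several values tie for the highest frequency, this helper returns all of them.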
-
max(c(48, 56, 78, 86, 90, 271)) - min(c(48, 56, 78, 86, 90, 271))
[1] 223
-
=MAX(A1:A6) - MIN(A1:A6)
Let’s take an easy example
1 2 3 4 5 6 7 8 9
What is Q1, Median, Q3?
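One way to check the answer in R, assuming the default `quantile()` method (type 7); other quantile definitions can give slightly different quartiles for other data sets:

```r
y <- 1:9

# Q1, median and Q3
q <- quantile(y, probs = c(0.25, 0.5, 0.75))
q
# 25% 50% 75%
#   3   5   7
```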
Source: Nicholas (1999)
Quartiles
-
quantile(c(48, 56, 78, 86, 90, 271))
0% 25% 50% 75% 100%
48.0 61.5 82.0 89.0 271.0
-
=QUARTILE.INC(A1:A6, 1) - first quartile
IQR
-
IQR(c(48, 56, 78, 86, 90, 271))
-
=QUARTILE.INC(A1:A6, 3)-QUARTILE.INC(A1:A6, 1) - third quartile - first quartile
Population variance: \sigma^2 = \frac{\sum_{i=1}^{N}(y_i - \mu)^2}{N}
Sample variance: s^2 = \frac{\sum_{i=1}^{n}(y_i - \overline{y})^2}{n-1}
-
var(c(48, 56, 78, 86, 90, 271))
-
=VAR.S(A1:A6)
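The two formulas differ only in the denominator. A minimal sketch that builds both from the squared deviations (variable names are assumptions for illustration):

```r
y <- c(48, 56, 78, 86, 90, 271)
n <- length(y)

# Squared deviations from the mean
squared_dev <- (y - mean(y))^2

# Sample variance: divide by n - 1 (this is what var() computes)
manual_var <- sum(squared_dev) / (n - 1)

# Population variance: divide by n; only appropriate if y is the whole population
pop_var <- sum(squared_dev) / n

all.equal(manual_var, var(y))
```

Taking `sqrt()` of either value gives the corresponding standard deviation.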
Population standard deviation: \sigma = \sqrt{\frac{\sum_{i=1}^{N}(y_i - \mu)^2}{N}}
Sample standard deviation: s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \overline{y})^2}{n-1}}
-
sd(c(48, 56, 78, 86, 90, 271))
-
=STDEV.S(A1:A6)
Soil nitrogen (%): 2 16 22 45 65 93
CV=\left(\frac{s}{\overline{y}}\right)\times{100}
-
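sd(c(48, 56, 78, 86, 90, 271)) / mean(c(48, 56, 78, 86, 90, 271)) * 100
[1] 79.2604
sd(c(2, 16, 22, 45, 65, 93)) / mean(c(2, 16, 22, 45, 65, 93)) * 100
[1] 84.10661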
-
=(STDEV.S(A1:A6)/AVERAGE(A1:A6))*100
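Because the CV is unit-free, it can be used to compare the spread of variables measured on different scales. The sketch below wraps the formula in a small helper; the function name `cv` and the vector names `soil_c` and `soil_n` are assumptions made for illustration.

```r
# Hypothetical helper: coefficient of variation as a percentage
cv <- function(x) sd(x) / mean(x) * 100

soil_c <- c(48, 56, 78, 86, 90, 271)  # first data set used in these slides
soil_n <- c(2, 16, 22, 45, 65, 93)    # soil nitrogen (%)

cv(soil_c)
cv(soil_n)
```

The data set with the larger CV is more variable relative to its own mean.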
# Load necessary library
library(ggplot2)
# Your disease data
disease <- c("None", "Moderate", "None", "Severe", "Moderate", "Moderate", "Severe", "Moderate", "None", "Moderate")
# Order factors from no disease to severe disease
disease = factor(disease, levels = c("None", "Moderate", "Severe"))
# Create a frequency table
disease_tbl <- table(disease)
print(disease_tbl)
disease
    None Moderate   Severe
       3        5        2
# Convert the table to a data frame for ggplot2
disease_df <- as.data.frame(disease_tbl)
# Rename the columns appropriately
names(disease_df) <- c("Disease", "Frequency")
# Create the bar plot
p <- ggplot(disease_df, aes(x = Disease, y = Frequency)) +
  geom_bar(stat = "identity") +
  ggtitle("Frequency of Disease Categories") +
  xlab("Disease Category") +
  ylab("Frequency")
# Display the plot
p
tidyverse
# Load necessary libraries
library(tidyverse)
# Your disease data
disease <- c("None", "Moderate", "None", "Severe", "Moderate", "Moderate", "Severe", "Moderate", "None", "Moderate")
# Convert to tibble and count occurrences
disease_data <- tibble(disease) %>%
mutate(disease = factor(disease, levels = c("None", "Moderate", "Severe"))) %>%
count(disease, name = "Frequency")
# Create the bar plot
ggplot(disease_data, aes(x = disease, y = Frequency)) +
  geom_bar(stat = "identity") +
  ggtitle("Frequency of Disease Categories") +
  xlab("Disease Category") +
  ylab("Frequency")
Source: Weissgerber et al. (2015)
# Load necessary library
library(ggplot2)
# Your data
soil_c <- c(48, 56, 78, 86, 90, 271)
# Convert to a data frame
soil_c_df <- data.frame(Value = soil_c)
# Create the strip chart
p <- ggplot(soil_c_df, aes(x = "", y = Value)) +
  geom_jitter(width = 0) +
  ggtitle("Strip chart of soil carbon") +
  xlab("") +
  ylab("Soil carbon (t/ha)")
# Display the plot
p
Source: Nicholas (1999)
root_length <- c(108, 102, 100, 135, 113, 109, 92, 97, 73, 65,
68, 74, 93, 97, 118, 121, 103, 99, 90, 90,
99, 102, 106, 90, 92, 97, 100, 92, 80, 99,
103, 103, 115, 85, 96, 86, 85, 86, 91, 90,
94, 93, 93, 99, 109, 115, 110, 94, 107, 88,
101, 89, 117, 91, 112, 101, 91, 81, 80, 67,
69, 80, 86, 81, 65, 90, 99, 93, 90, 102,
72, 70, 90, 90, 87, 89, 90, 96, 108, 86)
# Load necessary library
library(ggplot2)
# Convert to a data frame
root_length_df <- data.frame(Value = root_length)
# Create the boxplot with the individual observations overlaid
ggplot(root_length_df, aes(x = "", y = Value)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, col = "red") +
  ggtitle("Boxplot of root length in bentgrass") +
  xlab("") +
  ylab("Root length (mm)")
g_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{y_i - \bar{y}}{s} \right)^3
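The skewness formula above can be sketched directly in base R. The function name `sample_skewness` is hypothetical, and `s` is assumed to be the sample standard deviation (n - 1 denominator), as in the earlier slides.

```r
# Hypothetical helper implementing the skewness formula term by term
sample_skewness <- function(y) {
  n <- length(y)
  s <- sd(y)              # sample standard deviation (n - 1 denominator)
  z <- (y - mean(y)) / s  # standardised deviations
  n / ((n - 1) * (n - 2)) * sum(z^3)
}

sample_skewness(c(48, 56, 78, 86, 90, 271))  # positive: long right tail
sample_skewness(c(1, 2, 3, 4, 5))            # symmetric data: approximately 0
```

If the e1071 package is installed, `e1071::skewness(y, type = 2)` is expected to give the same value, although that correspondence is an assumption here and has not been checked against the package.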
skewness function found in the e1071 package
This presentation is based on the SOLES Quarto reveal.js template and is licensed under a Creative Commons Attribution 4.0 International License.